Goto

Collaborating Authors

 anchor box





A Data-Driven RetinaNet Model for Small Object Detection in Aerial Images

Tang, Zhicheng, Tang, Jinwen, Shang, Yi

arXiv.org Artificial Intelligence

In the realm of aerial imaging, the ability to detect small objects is pivotal for a myriad of applications, encompassing environmental surveillance, urban design, and crisis management. Leveraging RetinaNet, this work unveils DDR-Net: a data-driven, deep-learning model devised to enhance the detection of diminutive objects. DDR-Net introduces novel, data-driven techniques to autonomously ascertain optimal feature maps and anchor estimations, cultivating a tailored and proficient training process while maintaining precision. Additionally, this paper presents an innovative sampling technique to bolster model efficacy under limited data training constraints. The model's enhanced detection capabilities support critical applications including wildlife and habitat monitoring, traffic flow optimization, and public safety improvements through accurate identification of small objects like vehicles and pedestrians. DDR-Net significantly reduces the cost and time required for data collection and training, offering efficient performance even with limited data. Empirical assessments over assorted aerial avian imagery datasets demonstrate that DDR-Net markedly surpasses RetinaNet and alternative contemporary models. These innovations advance current aerial image analysis technologies and promise wide-ranging impacts across multiple sectors including agriculture, security, and archaeology.


Clustering-based Feature Representation Learning for Oracle Bone Inscriptions Detection

Tao, Ye, Fu, Xinran, Pang, Honglin, Yang, Xi, Li, Chuntao

arXiv.org Artificial Intelligence

Oracle Bone Inscriptions (OBIs), play a crucial role in understanding ancient Chinese civilization. The automated detection of OBIs from rubbing images represents a fundamental yet challenging task in digital archaeology, primarily due to various degradation factors including noise and cracks that limit the effectiveness of conventional detection networks. To address these challenges, we propose a novel clustering-based feature space representation learning method. Our approach uniquely leverages the Oracle Bones Character (OBC) font library dataset as prior knowledge to enhance feature extraction in the detection network through clustering-based representation learning. The method incorporates a specialized loss function derived from clustering results to optimize feature representation, which is then integrated into the total network loss. We validate the effectiveness of our method by conducting experiments on two OBIs detection dataset using three mainstream detection frameworks: Faster R-CNN, DETR, and Sparse R-CNN. Through extensive experimentation, all frameworks demonstrate significant performance improvements.


AeroLite-MDNet: Lightweight Multi-task Deviation Detection Network for UAV Landing

Yang, Haiping, Liu, Huaxing, Wu, Wei, Chen, Zuohui, Wu, Ning

arXiv.org Artificial Intelligence

--Unmanned aerial vehicles (UA Vs) are increasingly employed in diverse applications such as land surveying, material transport, and environmental monitoring. Following missions like data collection or inspection, UA Vs must land safely at docking stations for storage or recharging, which is an essential requirement for ensuring operational continuity. However, accurate landing remains challenging due to factors like GPS signal interference. T o address this issue, we propose a deviation warning system for UA V landings, powered by a novel vision-based model called AeroLite-MDNet. This model integrates a multiscale fusion module for robust cross-scale object detection and incorporates a segmentation branch for efficient orientation estimation. We introduce a new evaluation metric, A verage Warning Delay (A WD), to quantify the system's sensitivity to landing deviations. Furthermore, we contribute a new dataset, UA VLand-Data, which captures real-world landing deviation scenarios to support training and evaluation. Experimental results show that our system achieves an A WD of 0.7 seconds with a deviation detection accuracy of 98.6%, demonstrating its effectiveness in enhancing UA V landing reliability. NMANNED aerial vehicles (UA Vs), also known as drones, have been widely used in fire detection, geological hazard monitoring, and dangerous behavior monitoring [1] for their agility, compactness, and cost-efficiency. To reduce the dependency of UA Vs on human labor and skills, UA V nests are widely used to minimize manual operations, allowing the UA Vs to perform autonomous monitoring. UA V nests also offer functionalities such as safe parking, charging, data transmission, routine maintenance, repairs, and communication relays [2].


SO-DETR: Leveraging Dual-Domain Features and Knowledge Distillation for Small Object Detection

Zhang, Huaxiang, Zhang, Hao, Mei, Aoran, Gan, Zhongxue, Zhu, Guo-Niu

arXiv.org Artificial Intelligence

Detection Transformer-based methods have achieved significant advancements in general object detection. However, challenges remain in effectively detecting small objects. One key difficulty is that existing encoders struggle to efficiently fuse low-level features. Additionally, the query selection strategies are not effectively tailored for small objects. To address these challenges, this paper proposes an efficient model, Small Object Detection Transformer (SO-DETR). The model comprises three key components: a dual-domain hybrid encoder, an enhanced query selection mechanism, and a knowledge distillation strategy. The dual-domain hybrid encoder integrates spatial and frequency domains to fuse multi-scale features effectively. This approach enhances the representation of high-resolution features while maintaining relatively low computational overhead. The enhanced query selection mechanism optimizes query initialization by dynamically selecting high-scoring anchor boxes using expanded IoU, thereby improving the allocation of query resources. Furthermore, by incorporating a lightweight backbone network and implementing a knowledge distillation strategy, we develop an efficient detector for small objects. Experimental results on the VisDrone-2019-DET and UAVVaste datasets demonstrate that SO-DETR outperforms existing methods with similar computational demands. The project page is available at https://github.com/ValiantDiligent/SO_DETR.


RepVGG-GELAN: Enhanced GELAN with VGG-STYLE ConvNets for Brain Tumour Detection

Balakrishnan, Thennarasi, Sengar, Sandeep Singh

arXiv.org Artificial Intelligence

Object detection algorithms particularly those based on YOLO have demonstrated remarkable efficiency in balancing speed and accuracy. However, their application in brain tumour detection remains underexplored. This study proposes RepVGG-GELAN, a novel YOLO architecture enhanced with RepVGG, a reparameterized convolutional approach for object detection tasks particularly focusing on brain tumour detection within medical images. RepVGG-GELAN leverages the RepVGG architecture to improve both speed and accuracy in detecting brain tumours. Integrating RepVGG into the YOLO framework aims to achieve a balance between computational efficiency and detection performance. This study includes a spatial pyramid pooling-based Generalized Efficient Layer Aggregation Network (GELAN) architecture which further enhances the capability of RepVGG. Experimental evaluation conducted on a brain tumour dataset demonstrates the effectiveness of RepVGG-GELAN surpassing existing RCS-YOLO in terms of precision and speed. Specifically, RepVGG-GELAN achieves an increased precision of 4.91% and an increased AP50 of 2.54% over the latest existing approach while operating at 240.7 GFLOPs. The proposed RepVGG-GELAN with GELAN architecture presents promising results establishing itself as a state-of-the-art solution for accurate and efficient brain tumour detection in medical images.


Guarantee Regions for Local Explanations

Havasi, Marton, Parbhoo, Sonali, Doshi-Velez, Finale

arXiv.org Artificial Intelligence

Interpretability methods that utilise local surrogate models (e.g. LIME) are very good at describing the behaviour of the predictive model at a point of interest, but they are not guaranteed to extrapolate to the local region surrounding the point. However, overfitting to the local curvature of the predictive model and malicious tampering can significantly limit extrapolation. We propose an anchor-based algorithm for identifying regions in which local explanations are guaranteed to be correct by explicitly describing those intervals along which the input features can be trusted. Our method produces an interpretable feature-aligned box where the prediction of the local surrogate model is guaranteed to match the predictive model. We demonstrate that our algorithm can be used to find explanations with larger guarantee regions that better cover the data manifold compared to existing baselines. We also show how our method can identify misleading local explanations with significantly poorer guarantee regions.


Inner-IoU: More Effective Intersection over Union Loss with Auxiliary Bounding Box

Zhang, Hao, Xu, Cong, Zhang, Shuaijie

arXiv.org Artificial Intelligence

With the rapid development of detectors, Bounding Box Regression (BBR) loss function has constantly updated and optimized. However, the existing IoU-based BBR still focus on accelerating convergence by adding new loss terms, ignoring the limitations of IoU loss term itself. Although theoretically IoU loss can effectively describe the state of bounding box regression,in practical applications, it cannot adjust itself according to different detectors and detection tasks, and does not have strong generalization. Based on the above, we first analyzed the BBR model and concluded that distinguishing different regression samples and using different scales of auxiliary bounding boxes to calculate losses can effectively accelerate the bounding box regression process. For high IoU samples, using smaller auxiliary bounding boxes to calculate losses can accelerate convergence, while larger auxiliary bounding boxes are suitable for low IoU samples. Then, we propose Inner-IoU loss, which calculates IoU loss through auxiliary bounding boxes. For different datasets and detectors, we introduce a scaling factor ratio to control the scale size of the auxiliary bounding boxes for calculating losses. Finally, integrate Inner-IoU into the existing IoU-based loss functions for simulation and comparative experiments. The experiment result demonstrate a further enhancement in detection performance with the utilization of the method proposed in this paper, verifying the effectiveness and generalization ability of Inner-IoU loss. Code is available at https://github.com/malagoutou/Inner-IoU.